    Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding principles for the European Open Science Cloud

    The FAIR Data Principles propose that all scholarly output should be Findable, Accessible, Interoperable, and Reusable. Because they are guiding principles, expressing only the kinds of behaviours that researchers should expect from contemporary data resources, how the FAIR Principles should manifest in reality was largely left open to interpretation. As support for the Principles has spread, so has the breadth of these interpretations. Observing this creeping spread of interpretation, several of the original authors felt it was appropriate to revisit the Principles and clarify both what FAIRness is and what it is not.

    Provenance-Centered Dataset of Drug-Drug Interactions

    Over the years, several studies have demonstrated the ability to identify potential drug-drug interactions by mining the literature (MEDLINE), electronic health records, public databases (DrugBank), and other sources. While each of these approaches is statistically validated on its own, none of them takes its overlap with the others into account as a decision-making variable. In this paper we present LInked Drug-Drug Interactions (LIDDI), a public nanopublication-based RDF dataset with trusty URIs that encompasses some of the most cited prediction methods and sources, giving researchers a resource for leveraging the work of others in their own prediction methods. Because one of the main obstacles to using external resources is mapping between the drug names and identifiers they use, we also provide the set of mappings we curated in order to compare the multiple sources aggregated in our dataset. Comment: In Proceedings of the 14th International Semantic Web Conference (ISWC) 201
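
    A minimal sketch of how a nanopublication-based RDF dataset such as LIDDI might be queried over SPARQL is shown below; the endpoint URL and the ex: vocabulary (DrugDrugInteraction, drug, reportedBy) are hypothetical placeholders introduced for illustration, not identifiers taken from the dataset itself.

        # Illustrative only: querying a LIDDI-style RDF dataset for drug-drug
        # interactions and the source that reported each one. The endpoint URL and
        # the predicate/class IRIs are assumed placeholders, not the real vocabulary.
        from SPARQLWrapper import SPARQLWrapper, JSON

        ENDPOINT = "http://example.org/sparql"  # assumed endpoint

        QUERY = """
        PREFIX ex: <http://example.org/liddi/>
        SELECT ?drugA ?drugB ?source WHERE {
          ?i a ex:DrugDrugInteraction ;
             ex:drug ?drugA ;
             ex:drug ?drugB ;
             ex:reportedBy ?source .
          FILTER (?drugA != ?drugB)
        }
        LIMIT 10
        """

        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(QUERY)
        sparql.setReturnFormat(JSON)
        for row in sparql.query().convert()["results"]["bindings"]:
            print(row["drugA"]["value"], "<->", row["drugB"]["value"],
                  "reported by", row["source"]["value"])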

    Drinking problems: Mechanisms of macropinosome formation and maturation.

    Macropinocytosis is a mechanism for the non-specific bulk uptake and internalisation of extracellular fluid. It plays specific and distinct roles in diverse cell types such as macrophages, dendritic cells and neurons, allowing cells to sample their environment, extract extracellular nutrients and regulate plasma membrane turnover. Macropinocytosis has recently been implicated in several diseases, including cancer, neurodegenerative diseases and atherosclerosis, and uptake by macropinocytosis is also exploited by several intracellular pathogens to gain entry into host cells. Both capturing and subsequently processing large volumes of extracellular fluid pose a number of unique challenges for the cell. Macropinosome formation requires co-ordinated three-dimensional manipulation of the cytoskeleton to form shaped protrusions able to entrap extracellular fluid. The subsequent maturation of these large vesicles then involves a complex series of membrane rearrangements to shrink and concentrate their contents whilst delivering the components required for digestion and recycling. Recognition of the diverse importance of macropinocytosis in physiology and disease has prompted a number of recent studies. In this article we summarise advances in our understanding of both macropinosome formation and maturation, and highlight the important unanswered questions.

    Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology

    Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined, large group of people, provides scalable access to human judgement. The crowd therefore has the potential to overcome the limited accuracy and scalability of current ontology quality-assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the method's configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than for SNOMED CT concepts. This disparity may account for the difference in performance, as fewer search results indicate a more difficult task for the worker. The number of Internet search results could therefore serve as a way to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, helping experts with their verification by completing the easy tasks and letting experts focus on the difficult ones, rather than as an expert replacement.
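
    As an illustration of the evaluation described above, the snippet below scores hypothetical crowd verdicts against expert labels with an ROC AUC using scikit-learn; the numbers are invented for the example and are not the study's data.

        # Illustrative only: computing an ROC AUC for crowd verification of ontology
        # relationships against expert judgements. Labels and scores are made up.
        from sklearn.metrics import roc_auc_score

        expert_labels = [1, 0, 0, 1, 0, 1, 0, 0]  # 1 = expert judged the relationship erroneous
        crowd_scores = [0.6, 0.2, 0.4, 0.7, 0.1, 0.3, 0.5, 0.2]  # fraction of workers flagging it

        print(f"Crowd vs. expert AUC: {roc_auc_score(expert_labels, crowd_scores):.2f}")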

    Interoperability and FAIRness through a novel combination of Web technologies

    Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories, ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations: clinical observations about genetic mutations in patients are highly sensitive, for example, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing its data holdings. We show that, by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve the discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
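
    One of the simplest resource-oriented patterns of this kind is HTTP content negotiation: asking a repository resource for a machine-readable (RDF) description rather than its HTML landing page. The sketch below, with a hypothetical resource URL, shows the general idea; it is not the paper's reference implementation.

        # Illustrative only: requesting an RDF (Turtle) representation of a data
        # resource via HTTP content negotiation. The URL is a hypothetical placeholder.
        import requests

        resource_url = "https://example.org/repository/dataset/42"  # assumed URL

        response = requests.get(resource_url,
                                headers={"Accept": "text/turtle"},  # prefer RDF over HTML
                                timeout=10)
        response.raise_for_status()
        print("Returned Content-Type:", response.headers.get("Content-Type"))
        print(response.text[:500])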

    Assaying Rho GTPase–dependent processes in Dictyostelium discoideum

    The model organism D. discoideum is well suited to investigating basic questions of molecular and cell biology, particularly those related to the structure, regulation and dynamics of the cytoskeleton, signal transduction, cell-cell adhesion and development. D. discoideum cells make use of Rho-regulated signaling pathways to reorganize the actin cytoskeleton during chemotaxis, endocytosis and cytokinesis. In this organism the Rho family encompasses 20 members, several of which belong to the Rac subfamily, but there are no representatives of the Cdc42 and Rho subfamilies. Here we present protocols suitable for monitoring the actin polymerization response and the activation of Rac upon stimulation of aggregation-competent cells with the chemoattractant cAMP, and for monitoring the localization and dynamics of Rac activity in live cells.

    Improved general regression network for protein domain boundary prediction

    Background: Protein domains provide some of the most useful information for understanding protein structure and function. Recent research on protein domain boundary prediction has mainly been based on widely known machine learning techniques, such as Artificial Neural Networks and Support Vector Machines. In this study, we propose a new machine learning model (IGRN) that can achieve accurate and reliable classification with significantly reduced computation. The IGRN was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information and an inter-domain linker index to detect possible domain boundaries for a target sequence. Results: The proposed model achieved an average prediction accuracy of 67% on the Benchmark_2 dataset for domain boundary identification in multi-domain proteins and showed superior predictive performance and generalisation ability among the most widely used neural network models. On the CASP7 benchmark dataset, it also demonstrated performance comparable to existing domain boundary predictors such as DOMpro, DomPred, DomSSEA, DomCut and DomainDiscovery, with 70.10% prediction accuracy. Conclusion: The performance of the proposed model compares favourably with that of other existing machine-learning-based methods as well as widely known domain boundary predictors on two benchmark datasets, and it excels in the identification of domain boundaries in terms of model bias, generalisation and computational requirements.
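
    For context, the sketch below implements a plain general regression network (the model family the IGRN builds on) in numpy: the prediction for a query feature vector is a Gaussian-kernel-weighted average of the training targets. The features, labels and smoothing parameter are made up; this is not the authors' improved IGRN.

        # Illustrative only: a standard general regression network (GRN) prediction,
        # i.e. a Gaussian-kernel-weighted average of training targets. This is the
        # generic model family, not the paper's improved IGRN; sigma is a guess.
        import numpy as np

        def grn_predict(X_train, y_train, X_query, sigma=1.0):
            """Return the GRN estimate for each row of X_query."""
            preds = []
            for x in X_query:
                d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distances to training points
                w = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian kernel weights
                preds.append(np.dot(w, y_train) / (np.sum(w) + 1e-12))
            return np.array(preds)

        # Toy data standing in for per-residue features (e.g. PSSM-derived) and labels
        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(50, 8))
        y_train = (rng.random(50) > 0.8).astype(float)    # sparse "boundary" labels
        X_query = rng.normal(size=(5, 8))
        print(grn_predict(X_train, y_train, X_query, sigma=1.5))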